Semantic Similarity Measures for Malay Sentences

Authors

  • Shahrul Azman Mohd. Noah
  • Amru Yusrin Amruddin
  • Nazlia Omar
Abstract

Semantic similarity is an important element in many applications, such as information extraction, information retrieval, document clustering and ontology learning. Most previous semantic similarity measures have been defined between words or concepts (i.e. word-to-word similarity), and so ignore the text or sentence in which the concepts participate. Semantic text similarity was made possible by the availability of semantic lexicons such as WordNet for English and GermaNet for German. For languages such as Malay, however, text similarity has proved difficult because no comparable resources are available. This paper describes our approach to text similarity in the Malay language. We used a preprocessed Malay dictionary and an overlap edge counting based method to first calculate word-to-word semantic similarity. The word-to-word measure is then used to compute semantic sentence similarity, using a modified version of an approach developed for English. The results of the experiments are very encouraging and indicate the potential of semantic similarity measures for Malay sentences.
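The two-stage idea in the abstract (word-to-word similarity first, then sentence similarity built on top of it) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' method: the toy dictionary `TOY_DICTIONARY`, the choice of shortest-path length as the edge count, the `1 / (1 + distance)` scoring, and the best-match averaging used for sentences. The abstract does not specify any of these details.

```python
from collections import deque

# Hypothetical toy dictionary: headword -> words used in its definition.
# The paper's preprocessed Malay dictionary is not available here.
TOY_DICTIONARY = {
    "kereta": ["kenderaan", "roda"],
    "motosikal": ["kenderaan", "roda"],
    "kenderaan": ["alat", "angkut"],
    "kucing": ["haiwan"],
    "anjing": ["haiwan"],
    "haiwan": ["benda", "hidup"],
}


def build_graph(dictionary):
    """Link each headword to the words of its definition (undirected)."""
    graph = {}
    for head, gloss in dictionary.items():
        for word in gloss:
            graph.setdefault(head, set()).add(word)
            graph.setdefault(word, set()).add(head)
    return graph


def path_length(graph, source, target):
    """Breadth-first search: number of edges on the shortest path, or None."""
    if source == target:
        return 0
    if source not in graph or target not in graph:
        return None
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph[node]:
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def word_similarity(graph, a, b):
    """Assumed scoring: fewer edges between two words means higher similarity."""
    dist = path_length(graph, a, b)
    return 0.0 if dist is None else 1.0 / (1.0 + dist)


def sentence_similarity(graph, s1, s2):
    """Average each word's best match in the other sentence, in both directions."""
    t1, t2 = s1.lower().split(), s2.lower().split()
    if not t1 or not t2:
        return 0.0

    def directed(src, dst):
        best = [max(word_similarity(graph, w, v) for v in dst) for w in src]
        return sum(best) / len(src)

    return 0.5 * (directed(t1, t2) + directed(t2, t1))


GRAPH = build_graph(TOY_DICTIONARY)
```

With this toy data, `sentence_similarity(GRAPH, "kucing", "anjing")` scores higher than `sentence_similarity(GRAPH, "kucing", "kereta")`, because "kucing" and "anjing" share the definition word "haiwan" while "kucing" and "kereta" are not connected at all.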

Similar resources

A Semantic Similarity Measure between Sentences

The purpose of this paper is to present a mathematical model for estimating semantic similarity among sentences in texts. The similarity measure is constructed from the semantic similarity among concepts and a set of concepts. Based on this model, we develop algorithms to calculate the semantic similarity between two sets of concepts and then the ones to estimate the semantic similarity between ...

Addressing the Variability of Natural Language Expression in Sentence Similarity with Semantic Structure of the Sentences

In this paper, we present a new approach that incorporates the semantic structure of sentences, in the form of verb-argument structure, to measure semantic similarity between sentences. The variability of natural language expression makes it difficult for existing text similarity measures to accurately identify semantically similar sentences since sentences conveying the same fact or concept may be c...

CFILT-CORE: Semantic Textual Similarity using Universal Networking Language

This paper describes the system that was submitted in the *SEM 2013 Semantic Textual Similarity shared task. The task aims to find the similarity score between a pair of sentences. We describe a Universal Networking Language (UNL) based semantic extraction system for measuring the semantic similarity. Our approach combines syntactic and word level similarity measures along with the UNL based se...

CFILT-CORE: Finding Semantic Textual Similarity using UNL

Semantic Textual Similarity is the task of finding the degree of similarity between a pair of sentences through semantics extraction. This is motivated by the fact that syntactically diverse sentences often convey the same meaning. This paper describes the approach that was used in the *SEM Shared Task 2013. The approach combines semantic, syntactic and lexical similarity measures for finding s...

Presentation of an efficient automatic short answer grading model based on combination of pseudo relevance feedback and semantic relatedness measures

Automatic short answer grading (ASAG) is the automated process of assessing natural language answers using computational methods and machine learning algorithms. Development of large-scale smart education systems on one hand and the importance of assessment as a key factor in the learning process and its confronted challenges, on the other hand, have significantly increased the need for ...

Publication date: 2007